You are looking at content from Sapping Attention, which was my primary blog from 2010 to 2015; I am republishing all items from there on this page, but for the foreseeable future you should be able to read them in their original form at sappingattention.blogspot.com. For current posts, see here.

Posts with tag Changes in language over time


Jan 09 2013

Following up on my previous topic modeling post, I want to talk about one thing humanists actually do with topic models once they build them, most of the time: chart the topics over time. Since I think that, although topic modeling can be very useful, there's too little skepticism about the technique, I'm venturing to provide it (even with, I'm sure, a gross misunderstanding or two). More generally, the sort of mistakes temporal changes cause should call into question the complacency with which humanists tend to treat topics in topic modeling as stable abstractions, and argue for a much greater attention to the granular words that make up a topic model.

May 10 2011

Before end-of-semester madness, I was looking at how shifts in vocabulary usage occur. In many cases, I found, vocabulary change doesn't happen evenly across all authors. Instead, it can happen generationally; older people tend to use words at the rate that was common in their youth, and younger people anticipate future word patterns. An eighty-year-old in 1880 uses a word like "outside" more like a 40-year-old in 1840 than he does like a 40-year-old in 1880. The original post has a more detailed explanation.
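To make the generational comparison concrete, here is a minimal sketch of the bookkeeping it implies: grouping usage rates by an author's birth year rather than by publication year. The record layout, the function names, and the numbers are all made up for illustration; this is not the code or data behind the original post.

```python
from collections import defaultdict


def birth_year(pub_year, author_age):
    """Cohort of an author: the year of publication minus age at publication."""
    return pub_year - author_age


def mean_rate_by(records, key):
    """Average the 'rate' field of the records, grouped by key(record)."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum of rates, count]
    for rec in records:
        group = key(rec)
        totals[group][0] += rec["rate"]
        totals[group][1] += 1
    return {group: total / n for group, (total, n) in totals.items()}


# Hypothetical usage rates for one word (say, occurrences per 10,000 words).
records = [
    {"pub_year": 1840, "age": 40, "rate": 2.0},
    {"pub_year": 1880, "age": 80, "rate": 2.0},
    {"pub_year": 1880, "age": 40, "rate": 6.0},
]

# Grouping by cohort puts the two authors born in 1800 together.
by_cohort = mean_rate_by(records, lambda r: birth_year(r["pub_year"], r["age"]))
# Grouping by publication year mixes the old and the young author of 1880.
by_pub_year = mean_rate_by(records, lambda r: r["pub_year"])
```

In this toy data the two authors born in 1800 share a rate even though they publish forty years apart, which is the pattern the generational story predicts; real data would of course be much noisier.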

Dec 13 2010

I'm interested in the ways different words are tied together. That's sort of the universal feature of this project, so figuring out ways to find them would be useful. I already looked at some ways of finding interesting words for "scientific method," but that was in the context of the related words as an endpoint of the analysis. I want to be able to automatically generate linked words, as well. I'm going to think through this staying on "capitalist" as the word of the day. Fair warning: this post is a rambler.

Dec 04 2010

This verges on unreflective datadumping, but because it's easy and I think people might find it interesting, I'm going to drop in some of my own charts for total word use in 30,000 books by the largest American publishers on the same terms for which the Times published Cohen's charts of title word counts. I've tossed in a couple extra words where it seems interesting, including some alternate word-forms that tell a story, using a perl word-stemming algorithm I set up the other day that works fairly well. My charts run from 1830 (there just aren't many American books from before, and even the data from the 30s is a little screwy) to 1922 (the date that digital history ends; thank you, Sonny Bono). In some cases (that 1874 peak for science), the American and British trends are surprisingly close. Sometimes, they aren't.

Dec 03 2010

So I just looked at patterns of commemoration for a few famous anniversaries. This is, for some people, kind of interesting: how does the publishing industry focus in on certain figures to create news or resurgences of interest in them? I love the way we get excited about the Civil War sesquicentennial now, or the Darwin/Lincoln year last year.

Nov 15 2010

I'm going to keep looking at the list of isms, because a) they're fun; and b) the methods we use on them can be used on any group of words, for example, ones that we find are highly tied to evolution. So, let's use them as a test case for one of the questions I started out with: how can we find similarities in the historical patterns of emergence and submergence of words?
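One simple way to put a number on "similar historical patterns" is to correlate two words' yearly frequency series. The sketch below uses plain Pearson correlation on invented data; it is an assumption about how one might start, not necessarily the measure used in the full post, and the names `pearson` and `most_similar` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no trend to compare
    return cov / math.sqrt(var_x * var_y)


def most_similar(target, series_by_word):
    """Rank the other words by correlation with the target word, highest first."""
    base = series_by_word[target]
    scores = [
        (word, pearson(base, series))
        for word, series in series_by_word.items()
        if word != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Invented frequencies, one value per decade, all series aligned on the same years.
toy_series = {
    "socialism": [1.0, 2.0, 4.0, 8.0],
    "communism": [2.0, 4.0, 8.0, 16.0],
    "mesmerism": [8.0, 4.0, 2.0, 1.0],
}
ranking = most_similar("socialism", toy_series)
```

Correlation only captures the shape of a curve, not its size, so a rare word and a common one can score as near-identical if they rise and fall together.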

Nov 14 2010

Here's a fun way of using this dataset to convey a lot of historical information. I took all the 414 words that end in "ism" in my database, and plotted them by the year in which they peaked,* with the size proportional to their use at peak. I'm going to think about how to make it flashier, but it's pretty interesting as it is. Sample below, and full chart after the break.
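The selection step described here (keep the words ending in "ism", find each one's peak year and its use at the peak) can be sketched in a few lines. The data structure and the counts below are invented for illustration, and `peak_years` is a hypothetical helper rather than anything from my database code.

```python
def peak_years(counts, suffix="ism"):
    """Map each word ending in `suffix` to (peak year, value at peak).

    `counts` maps word -> {year: usage}. If two years tie for the peak,
    the first one encountered in the word's dictionary is kept.
    """
    peaks = {}
    for word, by_year in counts.items():
        if not word.endswith(suffix) or not by_year:
            continue
        year = max(by_year, key=by_year.get)
        peaks[word] = (year, by_year[year])
    return peaks


# Invented usage figures, not real counts.
toy_counts = {
    "mesmerism": {1840: 12.0, 1850: 30.0, 1860: 9.0},
    "socialism": {1840: 1.0, 1880: 20.0, 1900: 55.0},
    "railroad": {1840: 40.0, 1880: 90.0},
}
peaks = peak_years(toy_counts)
```

Each (year, value) pair then gives a label's position on the time axis and its size on the chart; the plotting itself is left out of the sketch.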

Nov 09 2010

I can't resist making a few more comments on that technologies graph that I laid out. I'm going to add a few thousand more books to the counts overnight, so I won't make any new charts until tomorrow, but look at this one again.

Nov 08 2010

An anonymous correspondent says:

Nov 07 2010

Let's start with just some of the basic wordcount results. Dan Cohen posted some similar things for the Victorian period on his blog, and used the numbers mostly to test hypotheses about change over time. I can give you a lot more like that (I confirmed for someone, though not as neatly as he'd probably like, that "business" became a much more prevalent word through the 19C). But as Cohen implies, such charts can be cooler than they are illuminating.